Deploying Lucene on the Grid
Abstract
We investigate whether and how open source retrieval engines can be deployed in a grid environment. Compared with conventional distributed IR, one of the most significant differences of a grid is the lack of a priori knowledge about which nodes are available. In addition, it is unknown when a particular node will have the time and resources to start a submitted job. Conventional methods such as RMI are therefore not directly usable, and we propose a different approach that uses middleware designed specifically for grids. We describe GridLucene, an extension of the open source engine Lucene with grid-specific classes built on this middleware. We report on an initial comparison between GridLucene and Lucene, and find a minor penalty in execution time for grid-based indexing and a more serious penalty for grid-based retrieval. The middleware can gather a set of physical resources into a single logical resource with abstract, user-definable properties. These properties can be used during indexing and retrieval to tell GridLucene which files it needs to access. With this kind of semantic information, grid nodes can "discover" which indices exist on the grid and which documents still need to be indexed. GridLucene is available for download under the same license as Lucene.
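The property-based discovery described in the abstract can be illustrated with a small sketch. It is not GridLucene's code and does not use the API of any grid middleware: the class LogicalResourceCatalog, its methods, and the property names "type" and "status" are all hypothetical, and an in-memory list stands in for the grid's resource catalog. It only shows the idea of grouping physical files under a logical resource and finding resources by their user-defined properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a grid resource catalog (illustration only). */
public class LogicalResourceCatalog {

    /** A logical resource: a name, user-defined properties, and the physical files it groups. */
    public static final class LogicalResource {
        public final String name;
        public final Map<String, String> properties;
        public final List<String> physicalFiles;

        public LogicalResource(String name, Map<String, String> properties, List<String> physicalFiles) {
            this.name = name;
            this.properties = properties;
            this.physicalFiles = physicalFiles;
        }
    }

    private final List<LogicalResource> resources = new ArrayList<>();

    /** Adds a logical resource to the catalog. */
    public void register(LogicalResource resource) {
        resources.add(resource);
    }

    /** Returns every resource whose properties include all of the wanted key/value pairs. */
    public List<LogicalResource> find(Map<String, String> wanted) {
        List<LogicalResource> matches = new ArrayList<>();
        for (LogicalResource resource : resources) {
            boolean allMatch = true;
            for (Map.Entry<String, String> entry : wanted.entrySet()) {
                if (!entry.getValue().equals(resource.properties.get(entry.getKey()))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                matches.add(resource);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        LogicalResourceCatalog catalog = new LogicalResourceCatalog();
        // Property names and file names below are made up for this example.
        catalog.register(new LogicalResource("index-part-1",
                Map.of("type", "index"), List.of("segments", "part1.cfs")));
        catalog.register(new LogicalResource("incoming-docs",
                Map.of("type", "documents", "status", "unindexed"), List.of("a.txt", "b.txt")));

        // A retrieval job asks which indices exist; an indexing job asks what is still unindexed.
        for (LogicalResource r : catalog.find(Map.of("type", "index"))) {
            System.out.println("index: " + r.name + " " + r.physicalFiles);
        }
        for (LogicalResource r : catalog.find(Map.of("status", "unindexed"))) {
            System.out.println("to index: " + r.name + " " + r.physicalFiles);
        }
    }
}
```

In this sketch a job that lands on an arbitrary node needs no prior knowledge of file locations: it queries by property and receives the files to open, which mirrors the role the abstract assigns to the middleware's logical resources.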
Related resources
CLEF 2009: Grid@CLEF Pilot Track Overview
The Grid@CLEF track is a long term activity with the aim of running a series of systematic experiments in order to improve the comprehension of MLIA systems and gain an exhaustive picture of their behaviour with respect to languages. In particular, Grid@CLEF 2009 is a pilot track that has started to move the first steps in this direction by giving the participants the possibility of getting exp...
Grid@CLEF 2009 Track Overview
The Grid@CLEF track is a long term activity with the aim of running a series of systematic experiments in order to improve the comprehension of MLIA systems and gain an exhaustive picture of their behaviour with respect to languages. In particular, Grid@CLEF 2009 is a pilot track that has started to move the first steps in this direction by giving the participants the possibility of getting exp...
Evaluation of the Default Similarity Function in Lucene
Lucene [4, 3] is a popular open-source IR toolkit, which has been widely used in many search-related applications [5]. However, there was no study on evaluating the retrieval performance of the default retrieval function that is implemented in Lucene. Clearly, an improved retrieval function would enable all the applications based on Lucene such as Nutch to achieve higher search accuracy. Thus it...
Grid Application Deployment Kit
This report presents the Grid Application Deployment Kit, a toolkit for deploying applications in a Grid environment. It helps engineers and scientists solve problems in the Grid environment effectively, building on robust Grid technologies and the GridRPC API, a standard Remote Procedure Call interface for Grid-enabled applications. In this report, we also present mechanisms to reduce commu...
Application of Full Text Search Engine Based on Lucene
This paper introduces a full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture. It compares the response time of Lucene's full-text search with that of String search retrieval; the experimental results show that Lucene's full-text search is faster.
Publication date: 2006